Skip to content

Fix #815: deduplicate SessionConnector::watch() emissions with debouncing#858

Open
AlexMikhalev wants to merge 2266 commits intomainfrom
task/815-session-connector-dedup
Open

Fix #815: deduplicate SessionConnector::watch() emissions with debouncing#858
AlexMikhalev wants to merge 2266 commits intomainfrom
task/815-session-connector-dedup

Conversation

@AlexMikhalev
Copy link
Copy Markdown
Contributor

Summary

  • Add per-path HashMap<PathBuf, (usize, Instant)> tracking in NativeTerraphim AIConnector::watch()
  • Only emit session when messages.len grew AND 250 ms debounce window elapsed since last emission
  • Document dedup contract on the SessionConnector::watch trait method
  • Fix clippy collapsible_if in the emission guard
  • Tighten integration test: add lower-bound assertion and narrow upper bound from 5 to 4

Test plan

  • cargo test -p terraphim_sessions — all 54 tests pass including test_watch_dedup_on_incremental_writes
  • cargo clippy -p terraphim_sessions — clean
  • cargo fmt -p terraphim_sessions -- --check — clean

Refs #815

🤖 Generated with Terraphim AI

AlexMikhalev and others added 30 commits April 21, 2026 10:25
… + AutoMerge enqueue' (#717) from task/roc-v1-step-f-verdict-polling into main
…ostMergeTestGate enqueue + [ADF] issue on failure) — Refs adf-fleet#35
…ath — Refs adf-fleet#35

Add 5 integration tests driving handle_auto_merge_for_project with an
in-memory RecordingExecutor (no mock frameworks, async-trait impl):

* auto_merge_success_enqueues_post_merge_gate
* auto_merge_skipped_when_head_sha_changed
* auto_merge_failure_opens_adf_issue
* auto_merge_marks_dedupe_set_on_success
* auto_merge_skipped_when_pr_already_closed

Expose pub auto_merge_enqueued() accessor on AgentOrchestrator so
tests can verify the (project, pr_number, head_sha) dedupe set is
populated after a successful merge.
… from task/roc-v1-step-g-auto-merge-handler into main
…ait + config — Refs adf-fleet#36

Adds crates/terraphim_orchestrator/src/post_merge_gate.rs with:
- CommandRunner async trait + TokioCommandRunner real impl (streams
  stdout/stderr tails via bounded ring buffer so long test runs do not
  OOM; kills child on wall-time timeout).
- run_workspace_tests / classify_failure / revert_merge helpers.
- ScriptedRunner test double (no mock library — trait impl only).
- 9 inline unit tests covering runner behaviour, failure parsing, and
  revert paths (green, timeout, io-error, test-failure parse, harness
  detection, timeout classification, revert no-push, revert with push,
  revert all-paths-failed).

Adds PostMergeGateConfig to OrchestratorConfig (optional; defaults to
10 minute budget, push revert to origin main). Back-fills the new
field on every existing OrchestratorConfig initializer across lib.rs
test fixtures and integration tests.
…k - Refs adf-fleet#36

Replaces the Step B log-only stub for DispatchTask::PostMergeTestGate with
a real handler that:

- Resolves project working_dir as repo_root
- Runs post_merge_gate::run_workspace_tests with GateConfig built from
  orchestrator.toml [post_merge_gate] overrides (default 10 min budget)
- On green: logs post_merge_gate_verified at info
- On red: classifies failure, calls post_merge_gate::revert_merge to
  create revert commit + push to remote, opens [ADF] post-merge test gate
  reverted issue on the project's Gitea repo with merge_sha, revert_sha,
  top failing tests, stderr tail, wall_time
- Logs post_merge_gate_reverted at warn

handle_post_merge_test_gate_with_runner<R: CommandRunner + ?Sized> is
parametrised so integration tests can inject ScriptedRunner without
spawning cargo.

Adds truncate_for_issue helper for char-boundary-safe trimming of stderr
tails when opening the [ADF] issue (avoids corrupting UTF-8 mid-sequence).

All orchestrator tests pass; clippy clean; fmt clean.
) from task/roc-v1-step-h-post-merge-gate into main
…/merge — Refs adf-fleet#37

Add OrchestratorEvent enum (PrReviewed, PrAutoMerged, PrAutoMergedVerified,
PrAutoReverted) to quickwit.rs with emit_event helper on QuickwitFleetSink.
All emits are gated behind the quickwit feature and tolerate Quickwit down
(warn log, business logic unblocked).

Wire emit sites:
- PrReviewed in poll_pending_reviews_for_project after verdict parse
- PrAutoMerged in handle_auto_merge_for_project after merge_pr succeeds
- PrAutoMergedVerified in handle_post_merge_test_gate_with_runner on green
- PrAutoReverted in handle_post_merge_test_gate_with_runner on revert

Replace the two quickwit_event_placeholder log lines from Step H with
the typed emits above.

Add tests/quickwit_events_tests.rs: 4 field-mapping tests + 1 tolerance test
(all gated behind quickwit feature). 501 tests pass; clippy clean; fmt clean.

Co-Authored-By: Terraphim AI <noreply@terraphim.cloud>
…e flow (ROC v1 Step I)' (#726) from task/roc-v1-step-i-quickwit-events into main
…reviewers event-driven) — Refs adf-fleet#39

No template change needed: implementation-swarm is not yet in scripts/adf-setup/agents/.
Developer cron bump to */20 applies to live conf.d/*.toml during Step L rollout.
…ump (ROC v1 Step K)' (#727) from task/roc-v1-step-k-cron-extension into main
…lo -> terraphim -- Refs adf-fleet#40

Co-Authored-By: Terraphim AI <noreply@terraphim.cloud>
…728) from task/roc-v1-step-l-rollout-runbook into main
Updates the adf.rs provider registration from Kimi K2.5 to K2.6 so the
ProviderHealthMap probe + KeywordRouter fallback target the current
model. KG routing tables (planning_tier.md) and conf.d/digital-twins.toml
already reference kimi-for-coding/k2p6; this aligns register_providers.

Tests unchanged (test fixtures use synthetic k2p5 strings that remain
valid — provider_key_for_model is generic and C1 allow-list checks the
kimi-for-coding prefix, not the specific model string).
…tion to k2p6' (#734) from task/kimi-k2p6-provider into main
…rand

- Patch rustls-webpki from 0.103.10 to 0.103.12 (fixes name constraints bypass)
- Add RUSTSEC-2026-0098/0099 to deny.toml (serenity/discord-feature-only)
- Replace rand with WASM-compatible fastrand in terraphim_multi_agent and terraphim_kg_agents
- Remove two direct rand dependencies from workspace crates

Refs #630
…on (#818)

* feat(cli): add evaluate subcommand for automata ground-truth evaluation

Wire the existing evaluate() function to a CLI subcommand in terraphim_cli.

Changes:
- Add Evaluate command with --ground-truth and --thesaurus flags
- Add handle_evaluate() function using terraphim_automata::evaluate()
- Add 4 integration tests for evaluate command
- Wire Evaluate match arm in command dispatcher

The core evaluation logic was already implemented in terraphim_automata::evaluation
(~613 lines, 13 unit tests). This adds CLI integration for automation use.

Example usage:
  terraphim-cli evaluate --ground-truth gt.json --thesaurus th.json

Part of: Gitea #576

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(clippy): replace sort_by with sort_by_key for Rust 1.95 compatibility

Rust 1.95 promotes clippy::unnecessary_sort_by to hard error under -D warnings.
Convert all sort_by calls to sort_by_key across 3 crates:

- terraphim-markdown-parser: 1 change (descending sort with Reverse)
- terraphim_router: 1 change (descending sort with Reverse)
- terraphim-session-analyzer: 13 changes (ascending + descending)

Line 548 in reporter.rs retains sort_by with #[allow] due to fallible
string parsing in the key function.

Refs #576

* fix(clippy): fix remaining sort_by in session-analyzer main.rs and submodules

Missed in previous commit: session-analyzer has duplicated logic in main.rs
(binary target) and submodules (kg/search, patterns/loader) that also use
sort_by. Convert to sort_by_key where possible, add #[allow] for float
comparisons using partial_cmp.

Refs #576

* fix(clippy): workspace-wide sort_by -> sort_by_key for Rust 1.95 compatibility

Convert all remaining sort_by calls across 40 files to either sort_by_key
or #[allow(clippy::unnecessary_sort_by)] for cases with non-Copy types,
multi-line closures, or partial_cmp on floats.

Covers: terraphim_agent, terraphim_automata, terraphim_orchestrator,
terraphim_service, terraphim_persistence, terraphim_update, terraphim_usage,
terraphim_sessions, terraphim_cli, terraphim_mcp_server, terraphim_types,
terraphim_symphony, terraphim_tinyclaw, terraphim_multi_agent,
terraphim_agent_evolution, terraphim_agent_registry, terraphim_goal_alignment

Refs #576

* fix(clippy): fix Rust 1.95 lints in task_decomposition and rolegraph examples

- Remove unnecessary .into_iter() in extend() call (useless_conversion lint)
- Collapse if guards into match arms (collapsible_match lint)
- Allow explicit_counter_loop in rolegraph examples

Refs #576

* fix(clippy): allow collapsible_match lint in middleware and agent_evolution

Rust 1.95 clippy promotes collapsible_match to hard error under -D warnings.
Add #![allow] at file level for ripgrep.rs, orchestrator_workers.rs,
and parallelization.rs where collapsing the match arms would reduce
readability.

Refs #576

* fix(clippy): allow collapsible_match in goal_alignment/goals.rs Refs #576

* fix(ci): pin Rust toolchain to 1.94.0 to match local dev

dtolnay/rust-toolchain@stable installs latest (1.95.0) which has new
clippy lints (collapsible_match, unnecessary_sort_by, useless_conversion)
not present in 1.94. Pin all ci-pr.yml jobs to 1.94.0 and update
rust-toolchain.toml accordingly.

Refs #576

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Cherry-pick from local main (0115f06f).

Refs #612

Co-authored-by: Test User <test@test.com>
Pre-build at script line 98 ran cargo build --workspace --all-targets
without --features zlob. fff-search build.rs panics under CI when zlob
isn't enabled (intentional gate). Clippy step at line 112 already had
the flag; pre-build needed it too. Unblocks lint-and-format CI for
PR #818 and any future PR.

Refs #818

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
openai/* via the OpenAI Plus/Pro/Team plan is subscription-gated, not
pay-per-use. PR #631 (the P0 C1 fix) removed openai from the allow-list
under the mistaken belief that all openai/* was pay-per-use; this commit
restores it.

Session-token exhaustion (the temporary state that motivated #631 in the
first place) is already handled correctly by the orchestrator's existing
primitives:

- ProviderHealthMap probes every probe_ttl_secs (1800s / 30 min) and
  marks exhausted providers unhealthy. Routing falls through to the next
  KG route (anthropic/opus -> kimi-for-coding/k2p6 -> openai/gpt-5.4 ->
  zai-coding-plan/glm-5).
- CircuitBreaker opens on 5 consecutive failures, half-opens after
  cooldown, closes on successful probe.
- ProviderBudgetTracker hour+day tumbling UTC windows; force_exhaust()
  on throttle classification pushes past cap so routing drops until the
  next window rolls.
- error_signatures.rs classifies stderr as Throttle / Flake / Unknown
  per provider with configurable regex.

Changes:
- config.rs: add "openai" to ALLOWED_PROVIDER_PREFIXES.
- config.rs tests: include openai/gpt-5.4 and openai/gpt-5.3-codex in
  test_is_allowed_provider_c1_allowed_prefixes.
- provider_gate_tests.rs: rename probe_gate_rejects_openai_provider ->
  probe_gate_allows_openai_subscription_provider; invert assertion and
  document the session-token-tracking rationale.
- docs/taxonomy/routing_scenarios/adf/planning_tier.md: retain
  anthropic/opus as primary route, add kimi-for-coding/k2p6 and
  openai/gpt-5.4 as alternate routes; remove the previously-present
  legacy openai/gpt-5.4 entry that had been stripped as a C1 violation.

Tests: all 497 unit tests + 16 integration tests pass.
Clippy + fmt clean.
…st + KG routing' (#737) from task/kg-planning-tier-k2p6 into main
Replace the in-memory DeviceStorage backend with MarkdownLearningStore for
durable persistence of shared learnings across process restarts.

Changes:
- Expand markdown frontmatter to serialize full SharedLearning state
- Add backwards-compatible parsing for old sparse frontmatter files
- Replace DeviceStorage with MarkdownLearningStore in SharedLearningStore
- Implement real load_all() that hydrates index from persisted markdown
- Add startup deduplication by learning.id (canonical over shared copies)
- Update StoreConfig to include MarkdownStoreConfig
- Align default paths with ProjectDirs recommendation
- Add integration tests for restart durability and deduplication
- Fix unreachable match arm for SharedLearningSub::Inject

Refs #6
Handle missing learnings root as an empty store on first run and make
startup deduplication deterministic by preferring canonical agent copies
over shared replicas when IDs collide.

Refs #6
Test User and others added 23 commits April 27, 2026 17:52
Adds the agent template that the Phase 2 fan-out loop spawns when
`pr-security-sentinel` appears in `[pr_dispatch.agents_on_pr_open]`.

Mirrors the pr-spec-validator skeleton:
- sonnet (subscription-only) — security-audit needs deeper reasoning
  than a structural diff scan to spot data-flow vulnerabilities and
  unsafe-code subtleties; haiku is the wrong tier
- skill_chain = ["security-audit"]
- 2-hour idempotency check via the existing comment-marker pattern
  (uses agent-scoped marker "Last security-audited commit: <sha>")
- best-effort POST_STATUS helper mirroring build-runner.toml

Path filter (early-exit happy path) — implemented in this script,
NOT in the orchestrator:
- skips the LLM call when no `Cargo.toml`, `Cargo.lock`,
  `crates/**/src/**`, or `**/secrets/**` paths changed
- posts `success` (not no-post) with description "n/a — no
  security-relevant changes" because `adf/security` becomes a required
  check on `main` post-deploy; a never-resolved `pending` would block
  docs-only PRs forever

Verdict parsing prefers an explicit `Verdict:` line; falls back to a
`Risk:` line (critical/high → fail, medium → concerns, low → pass);
finally falls back to `state=success` "audit posted; manual gate" when
neither parses — so the commit status is never left in `pending` even
on transient agent failures.

The cron-scheduled `security-sentinel` agent stays in tree for
full-repo audits; this PR-event variant is additive and diff-scoped.

Operator follow-up after merge:
1. Rebuild + restart the orchestrator on bigbox so the new dispatch
   arm goes live.
2. PATCH branch_protections/main to add `adf/security` to required
   contexts.

Refs #953
…2c) Refs #953

Rebased onto main after Phase 2b/d/e merges.
…open with path filter' (#1028) from task/953-adf-phase-2c-security-sentinel into main
…eMeta Refs #1026' (#1029) from sync/1026-query-role-meta into main
…eports for #1026' (#1030) from sync/vv2-1026 into main
… #1012

- Intercept args before clap parsing to apply typo correction,
  alias expansion, and case-insensitive matching
- Add build_cli_forgiving_parser() with actual CLI subcommands
- Add apply_forgiving_parsing() heuristic for subcommand detection
- Print correction notifications to stderr
- Add 3 integration tests: auto-correction, alias expansion,
  case-insensitive matching
…PI Refs #1011

- Add Robot variant to Command enum with subcommands:
  capabilities, schemas, examples
- Add RobotFormat enum (json, table, minimal) for --format flag
- Add handle_robot_command() and format_robot_output() helpers
- Wire Robot command in main() before catch-all server/offline dispatch
- Add unreachable! arms in run_offline_command and run_server_command
  to satisfy exhaustive match (Robot is handled in main)
- Add 4 integration tests: capabilities JSON, schemas JSON,
  examples text, all formats honoured
- Fix pre-existing compilation errors in terraphim_orchestrator
  (config clone + concurrency_permit field)
- Add [Unreleased] section with commits since 1.17.0
- Generate doc gap report: 771 undocumented items across 46 crates
- Generate API reference snippets from documented public APIs

Refs #1046
- Change ContentBlock::ToolResult from is_error: bool to exit_code: i32
  with backward-compatible deserialization for old JSON containing is_error
- Add --include-pinned flag to search subcommand in terraphim_agent
- Add --pinned flag to graph subcommand as kg list --pinned equivalent
- Fix learn query to call query_all_entries when semantic=false, matching spec
- Add pinned_node_ids to RoleGraphResponseDto for server/agent parity
- Add get_role_graph_pinned to TuiService for offline pinned retrieval
- Add spec gaps fix entry to CHANGELOG (Refs #1040)
- Generate documentation gap report for workspace crates
- Generate API reference snippets for key public items
- Update existing Gitea issue #1046 with recurrence findings

Part of: #1046
…Refs #451

- Add hook_manager field to TerraphimAgent with HookManager initialized in ::new()
- Add execute_llm_with_hooks() helper that wraps LLM calls with run_pre_llm/run_post_llm
- Wire hooks into 5 agent command handlers: generate, answer, analyze, create, review
- Wire hooks into specialized agents: ChatAgent and SummarizationAgent
- Add unit tests verifying hook invocation (pre_llm, post_llm, block, modify decisions)
- All existing tests pass, clippy clean, fmt clean

Implements requirement from spec-validator remediation #442 to wire
LLM hooks that were previously defined but not connected.
- Update [Unreleased] with 20 entries from 2026-04-27 to 2026-04-28
- Scan 58 crates: ~2,647 pub items missing documentation
- Generate gap report at .docs/reports/documentation-gap-report-20260428.md
- Recurrence comment on Gitea issue #1046

Refs #1046
Clippy fixes (blocking ADF build-runner on all PRs):
- Remove needless borrows in exit_codes_integration_test.rs
- Replace useless format! with .to_string() in offline_mode_tests.rs
- Remove unused LearningStore import in learning.rs

Test helper fixes (pre-existing failures):
- Update review_pr_config_with_spec_fanout to push pr-spec-validator
  into pr_dispatch_per_project['alpha'] (per-project config takes
  precedence over top-level pr_dispatch)
- Same fix for review_pr_config_with_security_checklist_fanout and
  review_pr_config_with_test_fanout

Refs #238
…d handlers' (#1056) from task/451-llm-hooks-wiring into main
… test helpers' (#1058) from task/fix-clippy-and-test-helpers into main
…ebouncing Refs #815

- Track (path -> messages_len, Instant) per file in shared HashMap
- Only emit when messages.len grew AND debounce window (~250ms) elapsed
- Document dedup contract on SessionConnector::watch trait method
- Fix clippy collapsible-if in emission guard
- Integration test: 3 incremental writes produce at most 5 emissions
@AlexMikhalev
Copy link
Copy Markdown
Contributor Author

security_checklist Summary

PR #815 (task/815-session-connector-dedup) introduces debounce-based deduplication for SessionConnector::watch() emissions in the native Terraphim AI connector.

Risk: low

Verdict: pass

Findings

Low — Mutex poison propagates as silent task failure
Location: crates/terraphim_sessions/src/connector/native.rs:206

let mut guard = last_seen.lock().unwrap();

std::sync::Mutex::lock().unwrap() is called inside a tokio::spawn(async move { ... }) block. The pattern is otherwise correct (no .await held across the lock guard, so no async-mutex is needed), but if the mutex becomes poisoned (a Exit Classes inside the critical section in another spawned task), subsequent tasks will Exit Classes and be silently dropped by the Tokio runtime. The event is lost without any log message.

The critical section cannot be triggered by external input — paths originate from the notify fileSystem watcher constrained to the configured directory — so this is not exploitable. The risk is limited to silent event loss under abnormal process state.

Recommendation: replace .unwrap() with .unwrap_or_else(|e| e.into_inner()) (recover from poisoned state) or add a tracing warn before the unwrap for observability.


Informational — Unbounded HashMap accumulation
Location: crates/terraphim_sessions/src/connector/native.rs:173

last_seen: HashMap<PathBuf, (usize, Instant)> grows without eviction for the lifetime of the watcher. In a directory tree with many session files this is benign; in a long-running process watching directories subject to large-scale file creation (e.g. CI farms), it could accumulate memory without bound. Not exploitable. Consider periodic pruning of entries older than a configurable TTL if watch lifetimes are expected to be long.


Informational — Path extension check without canonicalisation
Location: crates/terraphim_sessions/src/connector/native.rs:193

if path.extension().is_some_and(|ext| ext == "jsonl")

The .jsonl extension filter is correct for the intended domain. Paths are supplied by the notify crate (constrained to RecursiveMode::Recursive under the configured directory), not by user input, so there is no path traversal surface here. No action required.

Dependency Changes

None. Cargo.toml and Cargo.lock are unchanged in this PR. No new crates introduced, no version bumps.

Scope of Changes

The remaining diff is purely additive documentation (rustdoc on enum variants, #![allow(missing_docs)] on binary crates, struct field doc-comments). No logic changes outside the connector debounce implementation and its integration test.

  • No unsafe blocks introduced or modified.
  • No secrets, tokens, or credentials touched.
  • No new Graph surface, serialisation boundary, or auth path added.
  • Existing .jsonl-only file filter preserved; no new file types admitted.

Last security_checklist-audited commit: 971b3e0

@AlexMikhalev
Copy link
Copy Markdown
Contributor Author

Security Audit Summary

PR #858 introduces debounce/dedup logic to NativeClaudeConnector::watch() and adds Rustdoc to several crates. No Cargo.toml or Cargo.lock changes; no new dependencies.

Risk: low

Verdict: pass

Findings

[LOW] Mutex poison propagation in production watch loop
Location: crates/terraphim_sessions/src/connector/native.rstokio::spawn closure inside the spawn_blocking loop

let mut guard = last_seen.lock().unwrap();

std::sync::Mutex::lock() returns Err(PoisonError) when a previous thread panicked while holding the guard. Calling .unwrap() on the Err variant re-panics the current tokio::spawn task, silently dropping it. The watcher thread itself continues; the symptom is corrupted dedup state (stale entries not updated) rather than a crash. Not externally exploitable.

Recommendation: recover from poisoned state:

let mut guard = last_seen.lock().unwrap_or_else(|e| e.into_inner());

[INFORMATIONAL] tokio::spawn inside spawn_blocking
Location: same file, watch loop body

Valid — the runtime handle is accessible from blocking threads. Lock guard is dropped before any .await boundary, so no deadlock risk.

Dependency Changes

None. Cargo.toml and Cargo.lock unchanged.

Scope not affected

  • No unsafe blocks introduced.
  • No hardcoded credentials, tokens, or secrets.
  • No path traversal: events filtered to .jsonl extension before parsing.
  • No command injection vectors.
  • No new serialisation trust boundaries.
  • #[allow(missing_docs)] is lint suppression on test/validation infrastructure only.

Last security-audited commit: 971b3e0

@AlexMikhalev
Copy link
Copy Markdown
Contributor Author

Requirements Traceability Summary

PR #858 — Fix #815: deduplicate SessionConnector::watch() emissions with debouncing

Branch: task/815-session-connector-dedup | Author: AlexMikhalev | Head: 971b3e0

Scope: crates/terraphim_sessions/src/connector/{mod,native}.rs (core dedup logic) + rustdoc additions for terraphim_agent, terraphim-session-analyzer, terraphim_validation.


Verdict: pass


Traceability Matrix

Req ID Requirement Design Ref Impl Ref Tests Evidence Status
REQ-001 watch() MUST NOT emit duplicate sessions on each raw Modify FS event (inotify/FSEvents fires on every line flush) Dedup contract docstring on SessionConnector::watch trait — connector/mod.rs:124-142 native.rs:169-237HashMap<PathBuf,(usize,Instant)> tracking + debounce_duration=200ms guard native.rs:646 test_watch_dedup_on_incremental_writes cargo test -p terraphim_sessions → 54/54 pass
REQ-002 Emit only when messages.len grew AND debounce window (≈250 ms) elapsed since last emission for that path Same docstring — canonical strategy specified native.rs:206-229 should_emit boolean guard; (messages_len > *last_len) && duration_since >= debounce_duration Same test — 3 spaced writes, assert 1 ≤ emissions ≤ 4 Test passes with inotify on Linux
REQ-003 Dedup contract MUST be documented on the SessionConnector::watch trait method for implementors Self-referential — docstring IS the design artefact connector/mod.rs:128-137 **Dedup contract**: block added No separate test; contract verifiable via trait doctest cargo doc -p terraphim_sessions clean
REQ-004 NativeClaudeConnector MUST expose a with_path(PathBuf) constructor for test isolation (avoids ~/.claude coupling) Inferred from test strategy — no ADR native.rs:26-35 with_path() + override_path: Option<PathBuf> field native.rs:646 test uses NativeClaudeConnector::with_path(dir.path()) 54/54 pass
REQ-005 Receiver drop detection — watcher loop MUST exit cleanly when channel closed Inferred from resource management native.rs:180-184 tx.is_closed() check in poll loop No dedicated test; covered implicitly (test drop at end) Acceptable — resource leak guard, not observable correctness ⚠️
NFR-001 No regression in existing 53 tests Existing suite native.rs all existing tests updated from NativeClaudeConnector (unit struct) to NativeClaudeConnector::default() 54/54 pass (cargo test -p terraphim_sessions) Verified locally
NFR-002 Clippy clean cargo clippy -p terraphim_sessions Clean (0 warnings/errors)
DOC-001 Missing rustdoc on 30+ public items across terraphim_agent, terraphim-session-analyzer, terraphim_validation (Refs #1089) crates/terraphim_agent/src/lib.rs, repl/commands.rs, repl/handler.rs, robot/*.rs, service.rs; session-analyzer connectors/mod.rs, reporter.rs; terraphim_validation/src/lib.rs #[allow(missing_docs)] suppressions in test infra Rustdoc warnings absent; cargo doc passes

Gaps

  • ⚠️ REQ-005 — no explicit close-detection test: The tx.is_closed() guard is exercised only implicitly when the test's receiver is dropped. A dedicated test that drops the receiver mid-watch and verifies the spawned thread exits within a bounded time would make the invariant explicit. This is a follow-up, not a blocker — the implementation is correct.

  • ℹ️ No ADR for dedup strategy: The connector/mod.rs docstring documents the contract well, but an ADR in docs/adr/ would capture the decision context (why debounce over byte-offset tracking). The inline docs are sufficient for merge; an ADR is optional.

  • ℹ️ Commits 67654c0 and 0ceff5c are both prefixed fix(sessions): deduplicate… — the second (0ceff5c) is a fixup of the first (clippy + test tightening split into a third commit 375db14). Consider squashing before merge to keep git history clean.

  • ℹ️ Extra commits on branch (rustdoc for #1089, reports/, agent changes) are present in the diff but unrelated to the dedup fix. If the PR is intended solely for feat(learn): compile captured corrections into live thesaurus #815, these should be on a separate branch. If intended as a combined cleanup PR, the title and body should reflect that scope.


Last spec-validated commit: 971b3e0

@AlexMikhalev
Copy link
Copy Markdown
Contributor Author

Requirements Traceability Summary

PR #858 (task/815-session-connector-dedup) implements deduplication of SessionConnector::watch() emissions in NativeTerraphim AIConnector. The change introduces a HashMap<PathBuf, (usize, Instant)> debounce state with a 200 ms window, replaces the blocking for res in watcher_rx loop with a recv_timeout loop that polls for receiver-drop every 500 ms, and adds an injectable path override for test isolation.

Spec sources: PR #858 description, commit messages Refs #815, trait-level rustdoc on SessionConnector::watch().

Verdict: concerns

Core requirement implemented and verified. Three follow-up items noted — none are release blockers, but the rustdoc/code mismatch and the missing shutdown test are worth tracking.

Traceability Matrix

Req ID Requirement Design Ref Impl Ref Tests Status
REQ-815-01 watch() MUST NOT emit duplicate sessions for a single file modified multiple times via FS events Trait rustdoc dedup contract on SessionConnector::watch() in mod.rs native.rslast_seen HashMap + should_emit guard combining messages_len > last_len and debounce check test_watch_dedup_on_incremental_writes — upper bound received.len() <= 4
REQ-815-02 Debounce window of ≈250 ms; emit only after window elapses since last emission per path Trait rustdoc ("≈250 ms" in contract description) debounce_duration = Duration::from_millis(200) test_watch_dedup_on_incremental_writes — writes spaced 400 ms apart, each gap exceeds window ⚠️ 200 ms vs ≈250 ms doc mismatch
REQ-815-03 Emit only when messages.len has grown since last emission for that path Trait rustdoc (canonical strategy) messages_len > *last_len && now.duration_since(*last_time) >= debounce_duration test_watch_dedup_on_incremental_writes asserts last.messages.len() == 3
REQ-815-04 At least one emission must be received per session (lower bound) PR description ("lower-bound assertion") Same guard — None arm (first time seeing file) always emits test_watch_dedup_on_incremental_writes asserts !received.is_empty()
REQ-815-05 Watch loop exits cleanly when all receivers are dropped Implicit shutdown-safety requirement if tx.is_closed() { break; } checked each 500 ms recv_timeout cycle No dedicated test for the shutdown path ⚠️
NFR-815-01 NativeTerraphim AIConnector must accept a path override for test isolation Implicit (test harness requirement) NativeTerraphim AIConnector::with_path(PathBuf) constructor; override_path: Option<PathBuf> field; default_path() now uses override_path.or_else(...) test_watch_dedup_on_incremental_writes uses with_path
NFR-815-02 last_seen HashMap must not grow without bound in long-running sessions Not documented No eviction strategy — HashMap grows with each unique path seen No test ⚠️

Gaps

⚠️ Follow-up: Rustdoc says ≈250 ms; code uses 200 ms

SessionConnector::watch() trait doc states "≈250 ms" debounce window. native.rs uses Duration::from_millis(200). Future connector implementors will follow the doc, not the code, creating silent divergence.

Recommendation: align rustdoc to "≈200 ms" or extract a named constant and reference it in the doc.

⚠️ Follow-up: No test for clean shutdown (REQ-815-05)

The tx.is_closed() → break path is exercised implicitly when the integration test drops rx at the end, but there is no assertion that the watcher thread terminates within a bounded time. For a long-running Service this matters.

Recommendation: add a test that drops rx immediately after watch() returns, then waits ≤1 s and verifies the bacGraphround thread (or channel) has gone quiescent.

⚠️ Follow-up: Unbounded last_seen HashMap (NFR-815-02)

The debounce map grows proportionally with unique paths observed and is never pruned. Benign for typical Terraphim AI project directories (tens to hundreds of files), but worth a bounded structure (LRU-1000) for correctness at scale.

Recommendation: open a follow-up issue; not a blocker for this PR.

ℹ️ Note: Two duplicate commits carry identical messages

SHAs 0ceff5c22 and 67654c075 both carry fix(sessions): deduplicate SessionConnector::watch() emissions with debouncing Refs #815. Likely a rebase artefact. Worth squashing before merge to keep the Graph clean.


Last spec-validated commit: 971b3e0

@AlexMikhalev
Copy link
Copy Markdown
Contributor Author

Requirements Traceability Summary

PR #858 fixes a session-watch deduplication defect in NativeTerraphim AIConnector. The change is bounded to crates/terraphim_sessions/src/connector/ — no plans/ spec files were modified (acceptable for a focused bugfix with no new external API surface).

Referenced issue cross-System confusion: PR title cites #815, but GitHub issue #815 is feat(learn): compile Terraphim Graph Embeddings: Learning Agent Guidetured corrections into live Thesaurus — an entirely different feature. The dedup requirement lives only in the PR description itself. This is a traceability gap (INFERRED requirements, no stable ID in a ticket System that matches the code change).


Verdict: concerns


Traceability Matrix

Req ID Requirement Design Ref Impl Ref Tests Status
REQ-001 watch() must not emit duplicate sessions on every raw Modify event PR description + trait doc connector/mod.rs:124-135 native.rs:174-238 debounce loop test_watch_dedup_on_incremental_writes pass
REQ-002 Emit only when messages.len has grown since last emission PR description native.rs:203-215 same test pass
REQ-003 Debounce window ~250 ms gates burst emissions Trait doc "~250 ms" native.rs:197 Duration::from_millis(200) same test (400 ms gaps) concerns
REQ-004 SessionConnector::watch trait must document the dedup contract PR description connector/mod.rs:124-135 N/A pass
REQ-005 Connector must be testable with a custom path PR description native.rs:22-31 override_path + with_path() All watch tests use with_path pass
NFR-001 Watch loop exits cleanly when receiver is dropped Implicit native.rs:174 tx.is_closed() + native.rs:237 Disconnected arm Implicit via test scope exit concerns
NFR-002 Dedup state not shared across connector instances Implicit correctness last_seen scoped to spawn_blocking closure No multi-instance isolation test concerns

Gaps

concerns REQ-003 — Debounce constant inTerraphim Graph Embeddings: Learning Agent Guide
Trait doc says ~250 ms; implementation uses Duration::from_millis(200). The "approximately" qualifier provides some cover, but the values should be aligned. Recommend extracting a named constant const DEBOUNCE_MS: u64 = 250 referenced in both doc and code, or updating the doc to say ~200 ms. One-line follow-up.

concerns NFR-001 — No explicit receiver-drop test
The tx.is_closed() guard is present and correct. The integration test implicitly exercises this path when rx drops at scope end. However, the spawn_blocking thread may outlive the test by up to the 500 ms recv_timeout window, which can cause noise in test runners reporting lingering tasks. A test_watch_stops_when_receiver_dropped test with tokio::time::timeout would make the invariant explicit. Follow-up, not blocking.

concerns NFR-002 — No multi-instance isolation test
The dedup HashMap is scoped inside the closure so two concurrent connectors will not interfere — correct by construction. If a watch_all() aggregator is ever added, this isolation assumption should be re-verified under test. Note, not blocking.

note — Issue cross-System mismatch
PR cites Refs #815, but GitHub #815 is feat(learn): compile Terraphim Graph Embeddings: Learning Agent Guidetured corrections into live Thesaurus. The dedup requirement exists only in the PR description. Recommend creating a Gitea issue anchoring the dedup requirement and updating the PR body to reference it.


Last spec-validated commit: HEAD (PR #858 task/815-session-connector-dedup)

@AlexMikhalev
Copy link
Copy Markdown
Contributor Author

Requirements Traceability Summary

PR: #858 — Fix #815: deduplicate SessionConnector::watch() emissions with debouncing
Validator: Carthos (Domain Architect)
Date: 2026-05-01
Head SHA: pr-858 (fetched from origin pull/858/head)

This PR is tightly scoped to two files:
crates/terraphim_sessions/src/connector/mod.rs — trait documentation
crates/terraphim_sessions/src/connector/native.rs — watch loop rewrite

The change replaces an unbounded event-loop with a debounced, closed-channel-aware loop that tracks (path → messages.len, last_emission_time) to suppress duplicate emissions on inotify/FSEvents flush notifications.

Verdict: concerns

The core dedup invariant is correctly implemented and the integration test test_watch_dedup_on_incremental_writes provides meaningful coverage. Three concerns prevent a clean pass:
(1) The debounce duration is 200 ms in code (Duration::from_millis(200)) but the PR description and trait doc say 250 ms — a documentation/code inconsistency that should be resolved before merge;
(2) The integration test uses real tokio::time::sleep calls (total ~2.3 s of sleep per run), not tokio::time::pause, which will slow CI and is fragile on loaded runners — tokio::time::pause is the idiomatic approach for time-dependent tests in the project;
(3) No ADR or design note justifies the choice of messages.len as the dedup key over byte-offset or session hash — future maintainers will not understand why a growing-but-resetting message count (e.g., session file truncation) would trigger a spurious emission.

Traceability Matrix

Req ID Requirement Design Ref Impl Ref Tests Evidence Status
REQ-1 watch() must not emit duplicate sessions on rapid FS events (inotify flush storm) No ADR — MISSING native.rslast_seen: Arc<Mutex<HashMap<PathBuf,(usize,Instant)>>> + emit guard (lines ~182–220) test_watch_dedup_on_incremental_writes PR test plan: 54 tests pass; dedup test asserts count ≤ 4 with lower-bound ⚠️
REQ-2 Dedup strategy: emit only when messages.len grew AND debounce window elapsed PR description §bullet 2; trait doc mod.rs native.rsmessages_len > *last_len && now.duration_since(*last_time) >= debounce_duration test_watch_dedup_on_incremental_writes Integration test covers both conditions; timing relies on real sleep ⚠️
REQ-3 Watch loop must exit cleanly when receiver is dropped (no zombie spawned tasks) Implicit from CLAUDE.md async guidelines native.rsif tx.is_closed() { break } at loop start; Disconnected arm on recv_timeout test_watch_dedup_on_incremental_writes (rx dropped at end) Test drops rx; no assertion on clean exit but watcher thread terminates via Disconnected arm ⚠️
REQ-4 NativeClaudeConnector must accept a custom path for test isolation PR description §bullet 1 (implied) native.rsoverride_path: Option<PathBuf> field; with_path(path: PathBuf) constructor All existing tests updated to ::default(); new test uses ::with_path(dir.path()) Mechanically correct; all existing tests compile with default()
REQ-5 Dedup contract documented on SessionConnector::watch() trait method PR description §bullet 3 mod.rs — doc comment "Dedup contract" block on watch() n/a (documentation) Doc comment present; 200ms/250ms inconsistency flagged below ⚠️
REQ-6 clippy collapsible_if fixed in emission guard PR description §bullet 4 native.rs — guard merged into single if should_emit && tx.send... cargo clippy -p terraphim_sessions in test plan Stated clean; not independently verified here
REQ-7 Integration test lower-bound assertion added; upper bound narrowed from 5 → 4 PR description §bullet 5 native.rs — test asserts count >= 1 && count <= 4 test_watch_dedup_on_incremental_writes Improvement over prior unasserted count; see timing concern below ⚠️

Gaps

⚠️ Debounce constant inconsistency (follow-up, merge-blocker candidate)

debounce_duration = Duration::from_millis(200) in native.rs but the trait doc and PR description state "≈250 ms". The trait doc forms part of the public contract (mod.rs). Callers reading the doc will expect 250 ms headroom; the implementation fires at 200 ms. One-line fix: align doc to 200 ms or change constant to 250 ms and update the test sleeps accordingly.

⚠️ Real-time sleeps in tests (follow-up)

test_watch_dedup_on_incremental_writes accumulates ~2.3 s of tokio::time::sleep per run. The project's CLAUDE.md states: "Use tokio::time::pause for testing time-dependent code without real delays." The test does not use tokio::time::pause, making it slow and potentially flaky on CI runners under load. Recommended fix: redesign the test to write via a mock clock, or use tokio::time::pause with tokio::time::advance. If real FS events are required (cannot mock), document the reason explicitly in the test.

⚠️ No ADR for dedup key choice (follow-up)

messages.len as the dedup key is reasonable for append-only JSONL files, but:

  • If a session file is truncated and rewritten (e.g., compaction), messages.len resets and the old baseline is stale — the guard would suppress the new content until messages.len exceeds the old high-water mark.
  • Alternative keys (byte-offset, session hash) would detect truncation correctly.

No design note captures this trade-off. Recommend a one-paragraph comment in native.rs at the last_seen declaration explaining the key invariant and the truncation limitation. This does not block merge but will cause confusion at the next modification.

ℹ️ Referenced issue #815 is a merged PR, not a dedup issue

gh issue view 815 resolves to "feat(learn): compile captured corrections into live thesaurus" (merged PR). The session-watch dedup requirement source is therefore untraceable on GitHub. If #815 refers to a Gitea issue (which is currently 404), this cross-reference is broken. Recommend updating the PR title/description to reference the actual tracked requirement (Gitea issue or GitHub issue), or adding a "Background" section in the PR body.

Last spec-validated commit: pr-858

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant